AMENDMENTS TO THE SPECIFICATION

Please replace the paragraph spanning page 10, line 31 to page 11, line 16, with the 
following amended paragraph: 

Additionally, a "substantially identical" amino acid sequence is a sequence that differs
from a reference sequence by one or more conservative or non-conservative amino acid 
substitutions, deletions, or insertions, particularly when such a substitution occurs at a site that is not 
the active site of the molecule, and provided that the polypeptide essentially retains its functional 
properties. A conservative amino acid substitution, for example, substitutes one amino acid for 
another of the same class (e.g., substitution of one hydrophobic amino acid, such as isoleucine,
valine, leucine, or methionine, for another, or substitution of one polar amino acid for
another, such as substitution of arginine for lysine, glutamic acid for aspartic acid or glutamine for 
asparagine). One or more amino acids can be deleted, for example, from an endoglucanase 
polypeptide, resulting in modification of the structure of the polypeptide, without significantly 
altering its biological activity. For example, amino- or carboxyl-terminal amino acids that are not 
required for endoglucanase biological activity can be removed. Modified polypeptide sequences of 
the invention can be assayed for endoglucanase biological activity by any number of methods, 
including contacting the modified polypeptide sequence with an endoglucanase substrate and 
determining whether the modified polypeptide decreases the amount of specific substrate in the 
assay or increases the bioproducts of the enzymatic reaction of a functional endoglucanase 
polypeptide with the substrate. 
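
The conservative-substitution examples above can be illustrated with a minimal sketch. It assumes only the classes named in the text (the hydrophobic group isoleucine, valine, leucine, methionine, and the polar pairs arginine/lysine, glutamic acid/aspartic acid, glutamine/asparagine); the function names and one-letter groupings are hypothetical and are not part of the specification. It says nothing about whether a variant retains endoglucanase activity, which the text determines by assay.

```python
# Illustrative classes taken only from the examples given in the text.
# The names HYDROPHOBIC, CONSERVATIVE_PAIRS and both functions are hypothetical.
HYDROPHOBIC = set("IVLM")  # isoleucine, valine, leucine, methionine
CONSERVATIVE_PAIRS = {
    frozenset("RK"),  # arginine <-> lysine
    frozenset("ED"),  # glutamic acid <-> aspartic acid
    frozenset("QN"),  # glutamine <-> asparagine
}


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if the substitution stays within one of the example classes."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    if a in HYDROPHOBIC and b in HYDROPHOBIC:
        return True
    return frozenset((a, b)) in CONSERVATIVE_PAIRS


def classify_substitutions(reference: str, variant: str):
    """List (position, ref_residue, var_residue, conservative?) for equal-length sequences."""
    if len(reference) != len(variant):
        raise ValueError("this sketch handles substitutions only, not insertions or deletions")
    return [
        (pos, r, v, is_conservative(r, v))
        for pos, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]
```

For instance, `classify_substitutions("MKIE", "MRID")` reports a lysine-to-arginine change at position 2 and a glutamic-acid-to-aspartic-acid change at position 4, both flagged as conservative under these example classes.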

Please replace the paragraph spanning page 60, line 10 to page 61, line 18, with the 
following amended paragraph: 

A "comparison window", as used herein, includes reference to a segment of any one of the 
number of contiguous positions selected from the group consisting of from 20 to 600, usually about 50 
to about 200, more usually about 100 to about 150 in which a sequence may be compared to a
reference sequence of the same number of contiguous positions after the two sequences are optimally 
aligned. Methods of alignment of sequence for comparison are well-known in the art. Optimal 
alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of 
Smith & Waterman, Adv. Appl. Math. 2:482, 1981, by the homology alignment algorithm of
Needleman & Wunsch, J. Mol. Biol. 48:443, 1970, by the search for similarity method of Pearson &
Lipman, Proc. Natl. Acad. Sci. USA 85:2444, 1988, by computerized implementations of these
algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, 
Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual 
inspection. Other algorithms for determining homology or identity include, for example, in addition to 
a BLAST program (Basic Local Alignment Search Tool at the National Center for Biotechnology
Information), ALIGN, AMAS (Analysis of Multiply Aligned Sequences), AMPS (Protein Multiple
Sequence Alignment), ASSET (Aligned Segment Statistical Evaluation Tool), BANDS, 
BESTSCOR, BIOSCAN (Biological Sequence Comparative Analysis Node), BLIMPS (BLocks 
IMProved Searcher), FASTA, Intervals & Points, BMB, CLUSTAL V, CLUSTAL W, 
CONSENSUS, LCONSENSUS, WCONSENSUS, Smith-Waterman algorithm, DARWIN, Las
Vegas algorithm, FNAT (Forced Nucleotide Alignment Tool), Framealign, Framesearch, 
DYNAMIC, FILTER, FSAP (Fristensky Sequence Analysis Package), GAP (Global Alignment 
Program), GENAL, GIBBS, GenQuest, ISSC (Sensitive Sequence Comparison), LALIGN (Local 
Sequence Alignment), LCP (Local Content Program), MACAW (Multiple Alignment Construction 
& Analysis Workbench), MAP (Multiple Alignment Program), MBLKP, MBLKN, PIMA (Pattern- 
Induced Multi-sequence Alignment), SAGA (Sequence Alignment by Genetic Algorithm) and 
WHAT-IF. Such alignment programs can also be used to screen genome databases to identify 
polynucleotide sequences having substantially identical sequences. A number of genome databases
are available; for example, a substantial portion of the human genome is available as part of the
Human Genome Sequencing Project (J. Roach, http://weber.u.Washington.edu/~roach/human_genome_progress 2.html)
(Gibbs, 1995). At least twenty-one other genomes have already been
sequenced, including, for example, M. genitalium (Fraser et al., 1995), M. jannaschii (Bult et al.,
1996), H. influenzae (Fleischmann et al., 1995), E. coli (Blattner et al., 1997), yeast (S. cerevisiae)
(Mewes et al., 1997), and D. melanogaster (Adams et al., 2000). Significant progress has also been
made in sequencing the genomes of model organisms, such as mouse, C. elegans, and Arabidopsis sp.
Several databases containing genomic information annotated with some functional information are 
maintained by different organizations, and are accessible via the internet, for example,
http://www.tigr.org/tdb; http://www.genetics.wisc.edu; http://genome-www.stanford.edu/~ball;
http://hiv-web.lanl.gov; http://www.ncbi.nlm.nih.gov; http://www.ebi.ac.uk;
http://Pasteur.fr/other/biology; and http://www.genome.wi.mit.edu.
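
As an illustration of the local homology approach attributed above to Smith & Waterman, the following is a minimal sketch of a local alignment score computed by dynamic programming. The scoring values (match, mismatch, and a linear gap penalty) and the function name are assumptions chosen for illustration; they are not taken from the specification, and the named software packages use their own scoring schemes.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty).

    Scoring values are illustrative assumptions, not values from the text.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because every cell is floored at zero, the score reflects the best-matching segment pair rather than an end-to-end alignment, which is what distinguishes this local method from the global alignment of Needleman & Wunsch.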

Please replace the paragraph spanning page 61, line 19 to page 62, line 09, with the 
following amended paragraph: 

Examples of useful algorithms are the BLAST and BLAST 2.0 algorithms, which are
described in Altschul et al., J. Mol. Biol. 215:403-410, 1990, and Altschul et al., Nuc. Acids Res.
25:3389-3402, 1997, respectively. Software for performing BLAST analyses is publicly available
through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This
algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of 
length W in the query sequence, which either match or satisfy some positive-valued threshold score T 
when aligned with a word of the same length in a database sequence. T is referred to as the 
neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as 
seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both 
directions along each sequence for as far as the cumulative alignment score can be increased. 
Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a
pair of matching residues; always >0) and N (penalty score for mismatching residues; always <0).
For amino acid sequences, a scoring matrix is used to calculate
the cumulative score. Extension of the word hits in each direction is halted when: the cumulative
alignment score falls off by the quantity X from its maximum achieved value; the cumulative score 
goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or 
the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the 
sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as 
defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both
strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength of 3, an
expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl.
Acad. Sci. USA 89:10915, 1992), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a
comparison of both strands.
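
The seed-and-extend procedure described above can be sketched for nucleotide sequences as follows. This is a simplified, ungapped illustration and not the BLAST implementation: it seeds on exact word matches of length W only (no neighborhood threshold T), the drop-off value X=20 is an assumed figure not stated in the text, and the function names are hypothetical. M=5, N=-4 and W=11 follow the BLASTN defaults quoted above.

```python
def find_seeds(query: str, subject: str, w: int = 11):
    """Return (query_pos, subject_pos) for every exact word match of length w."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [
        (i, j)
        for i in range(len(query) - w + 1)
        for j in index.get(query[i:i + w], [])
    ]


def extend(query, subject, i, j, step, start=0, m=5, n=-4, x=20):
    """Ungapped extension from (i, j) in direction step (+1 or -1).

    Stops when the score falls x below its maximum, reaches zero or below,
    or either sequence ends. Returns (best_score, residues_kept).
    """
    score = best = start
    best_len = 0
    k = 0
    while 0 <= i + step * k < len(query) and 0 <= j + step * k < len(subject):
        score += m if query[i + step * k] == subject[j + step * k] else n
        k += 1
        if score > best:
            best, best_len = score, k
        if best - score >= x or score <= 0:
            break
    return best, best_len


def hsp_from_seed(query, subject, i, j, **params):
    """Extend a seed in both directions; return (score, query_start, query_end)."""
    right_score, right_len = extend(query, subject, i, j, +1, **params)
    # The leftward pass continues from the score already accumulated to the right.
    total, left_len = extend(query, subject, i - 1, j - 1, -1, start=right_score, **params)
    return total, i - left_len, i + right_len
```

With a shared 16-residue core flanked by mismatching residues, any seed inside the core extends to the same segment pair, scoring 16 × M = 80 under these assumed parameters.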

Please replace the paragraph of page 63, lines 1 to 12, with the
following amended paragraph:

The BLAST programs identify homologous sequences by identifying similar segments, 
which are referred to herein as "high-scoring segment pairs," between a query amino acid or nucleic acid
sequence and a test sequence which is preferably obtained from a protein or nucleic acid sequence 
database. High-scoring segment pairs are preferably identified (i.e., aligned) by means of a scoring 
matrix, many of which are known in the art. Preferably, the scoring matrix used is the BLOSUM62 
matrix (Gonnet et al., Science 256:1443-1445, 1992; Henikoff and Henikoff, Proteins 17:49-61,
1993). Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and 
Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and 
Structure, Washington: National Biomedical Research Foundation). BLAST programs are 
accessible through the U.S. National Library of Medicine, e.g., at www.ncbi.nlm.nih.gov.

Please replace the paragraph spanning page 67, line 29, to page 68, line 9, with the 
following amended paragraph: 

Figure 5 is a flow diagram illustrating one embodiment of an identifier process 300 for 
detecting the presence of a feature in a sequence. The process 300 begins at a start state 302 and 
then moves to a state 304 wherein a first sequence that is to be checked for features is stored to a 
memory 115 in the computer system 100. The process 300 then moves to a state 306 wherein a
database of sequence features is opened. Such a database would include a list of each feature's 
attributes along with the name of the feature. For example, a feature name could be "Initiation 
Codon" and the attribute would be "ATG". Another example would be the feature name 
"TAATAA Box" and the feature attribute would be "TAATAA". An example of such a database is 
produced by the University of Wisconsin Genetics Computer Group (www.gcg.com).
Alternatively, the features may be structural polypeptide motifs such as alpha helices, beta sheets, or 
functional polypeptide motifs such as enzymatic active sites, helix-turn-helix motifs or other motifs 
known to those skilled in the art. 
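
The identifier process described above (storing a sequence, opening a database of features, and checking the sequence for each feature) can be sketched as follows. The in-memory dictionary stands in for the feature database and contains only the two examples given in the text; the function name and the returned data layout are hypothetical, and simple exact string matching is an assumption that would not cover structural or functional polypeptide motifs.

```python
# Stand-in for the database of sequence features: feature name -> attribute.
# Only the two examples from the text are included.
FEATURES = {
    "Initiation Codon": "ATG",
    "TAATAA Box": "TAATAA",
}


def find_features(sequence: str, features=FEATURES):
    """Return (feature_name, zero_based_position) for every exact occurrence."""
    seq = sequence.upper()
    hits = []
    for name, attribute in features.items():
        start = seq.find(attribute)
        while start != -1:
            hits.append((name, start))
            # Resume one position later so overlapping occurrences are found.
            start = seq.find(attribute, start + 1)
    return sorted(hits, key=lambda hit: hit[1])
```

For example, `find_features("ccTAATAAggATGc")` reports the "TAATAA Box" at position 2 and the "Initiation Codon" at position 10.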